Input Sentence Splitting and Translating

Authors

  • Takao Doi
  • Eiichiro Sumita
Abstract

We propose a method to split and translate input sentences for speech translation in order to overcome the long-sentence problem. This approach is based on three criteria used to judge the goodness of translation results. The criteria use only the output of an MT system and assume neither a particular language nor a particular MT approach. In an experiment with an EBMT system, for which prior methods either cannot work or work badly, the proposed split-and-translate method achieves much better translation quality.
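
The split-and-translate idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the three criteria, so `toy_goodness` is a hypothetical stand-in that looks only at MT output, and `toy_translate` is a hypothetical translator that fails on long input. The sketch enumerates candidate splits, translates each piece, and keeps the split whose outputs score best.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence

Piece = List[str]


def candidate_splits(words: Sequence[str], max_parts: int) -> List[List[Piece]]:
    """Enumerate every way to cut `words` into 1..max_parts contiguous pieces."""
    n = len(words)
    results = []
    for parts in range(1, max_parts + 1):
        # Choose parts-1 cut positions strictly inside the sentence.
        for cuts in combinations(range(1, n), parts - 1):
            bounds = (0, *cuts, n)
            results.append([list(words[a:b]) for a, b in zip(bounds, bounds[1:])])
    return results


def split_and_translate(words: Sequence[str],
                        translate: Callable[[Piece], Optional[str]],
                        goodness: Callable[[List[Optional[str]]], tuple],
                        max_parts: int = 3):
    """Return (score, pieces, outputs) for the best-scoring candidate split."""
    best = None
    for pieces in candidate_splits(words, max_parts):
        outputs = [translate(p) for p in pieces]
        score = goodness(outputs)  # judged from the MT output only
        if best is None or score > best[0]:
            best = (score, pieces, outputs)
    return best


def toy_translate(piece: Piece) -> Optional[str]:
    """Hypothetical MT system: fails (returns None) on pieces longer than 3 words."""
    return " ".join(w.upper() for w in piece) if len(piece) <= 3 else None


def toy_goodness(outputs: List[Optional[str]]) -> tuple:
    """Hypothetical criterion: fewer failed pieces first, then fewer pieces."""
    failures = sum(1 for o in outputs if o is None)
    return (-failures, -len(outputs))


score, pieces, outputs = split_and_translate("a b c d e".split(),
                                             toy_translate, toy_goodness)
```

With the toy functions above, the unsplit five-word input fails to translate, so a two-piece split is selected. A real system would replace both toy functions, and exhaustive enumeration would likely need pruning for long inputs.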

Similar resources

Splitting Long Input Sentences for Phrase-based Statistical Machine Translation

Translation results suffer when a standard phrase-based statistical machine translation system is used for translating long sentences. The translation output will not have the same word order as the source. When a sentence is long, it should be partitioned into several clauses, and word reordering during translation should be done within these clauses, not between them. In this paper, we pr...

Splitting Input Sentence for Machine Translation Using Language Model with Sentence Similarity

In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input sentence appears promising. In previous research, many methods used N-gram clues to split sentences. In this paper, to supplement N-gram based splitting methods, we introduce another clue using sentence similarity based on edit-distance. In our splitting method, we ge...

System Description for IWSLT 2010

Our submission is a non-structural Example-Based Machine Translation system that translates text from Arabic to English, using a parallel corpus aligned at the paragraph / sentence level. Each new input sentence is fragmented into phrases and those phrases are matched to example patterns, using various levels of morphological information. Source-language synonyms were derived automatically and ...

Splitting Long or Ill-formed Input for Robust Spoken-language Translation

This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing, and does not degrade translation efficiency. The complete translation result is formed by concatenat...

Publication date: 2003